Getafix: Workload-aware Data Management in Lookback Processing Systems
Authors
Abstract
In this paper, we target lookback processing systems (LPS), which allow queries to operate on segment-based historical data. We present new strategies that decide which segments should be placed, and how they should be replicated. Our strategies leverage segment popularity and are provably optimal in replication level, and thus in memory and network overheads, for the static case. For the dynamic case, we present two heuristics. Our experiments show that these strategies improve memory and network utilization compared to existing strategies.
Related resources
Getafix: Workload-aware Distributed Interactive Analytics
Distributed interactive analytics engines (Druid, Redshift, Pinot) need to achieve low query latency while using the least storage space. This paper presents a solution to the problem of replication of data blocks and routing of queries. Our techniques decide the replication level of individual data blocks (based on popularity, access counts), as well as output optimal placement patterns for su...
Versatile workload-aware power management performability analysis of server virtualized systems
The widespread integration of virtualization technologies in data centers has, over the last few years, brought several benefits in terms of operating costs and flexibility. These benefits may be boosted through joint optimization of power management (PM) and dependability for virtualized systems. This indeed involves developing appropriate models to better understand their performability behavior wh...
Popular is Cheaper: Curtailing Memory Costs in Interactive Analytics Engines
This paper targets the growing area of interactive data analytics engines. We present a system called Getafix that intelligently decides replication levels and replica placement for data segments, in a way that is responsive to changing popularity of data access by incoming queries. We present an optimal solution to the static version of the problem, achieving minimality in both makespan and re...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. Current Hadoop and the rack-aware data placement strategy of the existing Hadoop distributed file system assume a homogeneous cluster in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...
JET: Electricity cost-aware dynamic workload management in geographically distributed datacenters
The ever-increasing operational cost of geographically distributed datacenters has become a critical issue for cloud service providers. In order to cut the electricity cost of geographically distributed datacenters, several workload management schemes have been proposed, such as Electricity price-aware InteR-datacenter load balancing (EIR), which reduces the electricity cost of active servers b...